Ontology Based Pivoted normalization using Vector Based Approach for information Retrieval

Authors

  • Vishal Jain
  • Mayank Singh
Abstract

Affiliations: Research Scholar, Computer Science and Engineering Department, Lingaya’s University, Faridabad; Associate Professor, Computer Science and Engineering Department, Lingaya’s University, Faridabad.

The vast number of documents on the web leaves users in a dilemma: they cannot tell which documents are relevant. Relevance here means how closely a document matches the given query. Many information extraction techniques are used for retrieving documents, but they have proved ineffective. This paper deals with the problem of classifying, analyzing, and extracting web documents using an information extraction method called Ontology Based Web Content Mining Methodology. We have evaluated the proposed methodology in two specific domains: the weather domain (web pages containing information about weather forecasting and analysis) and the Google TM collection (web pages containing news). The proposed methodology is procedural, i.e. it follows a finite number of steps to extract relevant documents according to the user’s query. It is based on the principles of Data Mining for analyzing web data. Data Mining first integrates the data to generate a warehouse. Then it extracts useful information with the help of an algorithm. The extracted documents are represented using a Vector Based Statistical Approach, which represents each document as a set of terms.
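To make the vector-based representation and the pivoted normalization named in the title more concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no formulas, so the sketch assumes the commonly used pivoted form `norm = (1 - slope) * pivot + slope * length`, with the number of unique terms as the document length and the collection average as the pivot. The function names, the whitespace tokenizer, the logarithmic TF-IDF weighting, and the default `slope` value are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def pivoted_scores(query, docs, slope=0.2):
    """Score each document against the query.

    Each document is represented as a bag of terms (term -> frequency).
    Term weights are (1 + ln tf) * idf, divided by a pivoted length norm.
    The slope value is an assumed default, not taken from the paper.
    """
    doc_terms = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())

    # Document length = number of unique terms; pivot = average length.
    lengths = [len(terms) for terms in doc_terms]
    pivot = sum(lengths) / n_docs

    query_terms = set(tokenize(query))
    scores = []
    for terms, length in zip(doc_terms, lengths):
        norm = (1.0 - slope) * pivot + slope * length
        score = 0.0
        for term in query_terms:
            tf = terms.get(term, 0)
            if tf == 0:
                continue
            idf = math.log((n_docs + 1) / df[term])
            score += (1.0 + math.log(tf)) * idf / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "rain forecast for monday",
        "stock market news today",
        "weather forecast rain rain and wind",
    ]
    for doc, score in zip(docs, pivoted_scores("rain forecast", docs)):
        print(f"{score:.4f}  {doc}")
```

With `slope` set to 1 the norm reduces to plain length normalization, and with `slope` set to 0 every document is divided by the same pivot, so the parameter controls how strongly longer documents are penalized.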

Similar resources

The Hong Kong Polytechnic University at the TREC 2004 Robust Track

In the robust track, we mainly tested our passage-based retrieval model with different passage sizes and weighting schemes. In our approach, we used two retrieval models, namely the 2-Poisson model using BM25 term weights and the vector space model (VSM) using adaptive pivoted unique document length normalization. Also, we utilize WordNet to re-weight some PRF terms and extract some context wor...

IIT TREC 2005: Genomics Track

For the TREC-2005 Genomics Track ad-hoc retrieval task, we report on the development of a scalable information retrieval engine based on a relational data model for the integration of structured data and text. Our objectives are to meet the need for the integrated search of heterogeneous data sets of biomedical literature and structured data found in biological databases, and to demonstrate the...

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...

Semi-fuzzy quantifiers for information retrieval

Recent research on fuzzy quantification for information retrieval has proposed the application of semi-fuzzy quantifiers for improving query languages. Fuzzy quantified sentences are useful as they allow additional restrictions to be imposed on the retrieval process unlike more popular retrieval approaches, which lack the facility to accurately express information needs. For instance, fuzzy qua...

An Executive Approach Based On the Production of Fuzzy Ontology Using the Semantic Web Rule Language Method (SWRL)

Today, the need to deal with ambiguous information in semantic web languages is increasing. Ontology is an important part of the W3C standards for the semantic web, used to define a conceptual standard vocabulary for the exchange of data between systems, the provision of reusable databases, and the facilitation of collaboration across multiple systems. However, classical ontology is not enough ...

Journal:
  • CoRR

Volume abs/1703.07384

Publication date 2013